Skip to content

Extend dCDH PSU-level wild bootstrap to cell granularity#332

Merged
igerber merged 16 commits intomainfrom
feature/dcdh-bootstrap-cell-period
Apr 19, 2026
Merged

Extend dCDH PSU-level wild bootstrap to cell granularity#332
igerber merged 16 commits intomainfrom
feature/dcdh-bootstrap-cell-period

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • Lift the last NotImplementedError gate in the dCDH survey contract: n_bootstrap > 0 combined with within-group-varying PSU is now supported via a cell-level Hall-Mammen wild multiplier bootstrap.
  • A dispatcher in _compute_dcdh_bootstrap detects PSU-within-group-constant regimes (PSU=group auto-inject and strictly-coarser PSU with within-group constancy) and routes them through the legacy group-level path so the bootstrap SE stays bit-identical to the previous release.
  • Under within-group-varying PSU a group contributing cells to multiple PSUs receives independent multiplier draws per PSU (correct Hall-Mammen wild PSU clustering at cell granularity). Multi-horizon bootstraps draw a single shared (n_bootstrap, n_psu) PSU-level weight matrix per block and broadcast per-horizon via each horizon's cell-to-PSU map, so the sup-t simultaneous confidence band remains a valid joint distribution.

Methodology references (required if estimator / math changes)

  • Method name(s): ChaisemartinDHaultfoeuille survey multiplier bootstrap (_compute_dcdh_bootstrap). Cell-level wild PSU Hall-Mammen extension; cohort-recentered per-cell IF U_centered_per_period (from PR Add cell-period IF allocator for dCDH survey variance #323).
  • Paper / source link(s): de Chaisemartin & D'Haultfœuille (2020, AER); (2022, NBER WP 29873, Web Appendix §3.7.3). Hall & Mammen (1996) wild bootstrap. The multiplier bootstrap itself is a library extension (the paper prescribes only the analytical plug-in variance); the cell-level allocator extension is documented as such in REGISTRY.md.
  • Any intentional deviations from the source (and why): Cell-level wild PSU bootstrap under within-group-varying PSU is a library convention extending the same post-period cell-level attribution shipped in PRs Add cell-period IF allocator for dCDH survey variance #323 and Extend dCDH heterogeneity SE to cell-period allocator #329. R DIDmultiplegtDYN does not support survey designs, so "deviation from R" does not apply.

Validation

  • Tests added/updated:
    • tests/test_survey_dcdh.py: flipped the bootstrap gating test to assert finite SE + CI under within-group-varying PSU at both the overall and per-horizon event-study surfaces. Added TestBootstrapCellPeriod with (a) pinned numeric baseline SE captured from pre-PR-4 code (origin/main at ac181b7f) as the bit-identity regression guard; (b) zero-weight-eligible-group smoke test; (c) three negative-contract tests asserting ValueError when psu_varies but the required per-cell tensor is missing or misshaped.
    • tests/test_dcdh_bootstrap_cell_period_coverage.py (new, @pytest.mark.slow): 500-rep MC coverage test validating empirical 95% coverage at both the overall bootstrap CI and the horizon-1 event-study bootstrap CI on a DGP with within-group-varying PSU. Horizon-specific assertion guards the shared-PSU-weight multi-horizon path.
  • Existing regression guards: test_auto_inject_bit_identical_to_group_level (auto-inject invariance), test_coarser_psu_produces_finite_se (strictly-coarser PSU), and the ATT + heterogeneity slow MC coverage tests all pass unchanged.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

igerber and others added 2 commits April 19, 2026 09:54
Lifts the last NotImplementedError gate in the dCDH survey contract:
n_bootstrap > 0 combined with within-group-varying PSU is now
supported via a cell-level Hall-Mammen wild multiplier bootstrap.

Approach: the bootstrap mixin dispatches on whether PSU varies across
cells of a group, inspecting a new (n_eligible, n_periods) PSU tensor
(sentinel -1 for zero-weight cells) built alongside the existing
group-level map. Under PSU-within-group-constant regimes (including
PSU=group auto-inject and strictly-coarser PSU with within-group
constancy) the dispatcher routes to the legacy group-level bootstrap
so the SE is bit-identical to the previous release. Under
within-group-varying PSU each (g, t) cell draws its multiplier via
the per-cell PSU map and a group contributing cells to multiple PSUs
receives independent multiplier draws per PSU — the correct
Hall-Mammen wild PSU clustering at cell granularity.

Multi-horizon bootstraps draw a single shared (n_bootstrap, n_psu)
PSU-level weight matrix once per block and broadcast per-horizon via
each horizon's cell-to-PSU map (via psu_codes_per_cell raveled and
sentinel-masked), so the sup-t simultaneous confidence band remains
a valid joint distribution across horizons. The cohort-recentered
per-horizon tensors U_centered_pp_l and U_centered_pp_pl_l are
persisted to _mh_pp_cache and _pl_pp_cache dicts inside the existing
analytical-SE loops so the bootstrap block can reuse them without
recomputing cohort-recentering.

Tests:
- Flip the bootstrap gating test to assert finite SE + CI under
  within-group-varying PSU at both the overall and per-horizon
  event-study surfaces (the multi-horizon path is the highest-risk
  surface for shared-weight regressions).
- Pin a numeric baseline SE captured from pre-PR-4 code
  (origin/main at SHA ac181b7) under PSU=group and assert bit-
  identity: test_bootstrap_se_matches_pre_pr4_baseline. Dispatcher
  regressions that silently drift away from the legacy path fail
  this guard.
- Add zero-weight-eligible-group smoke test.
- Add slow-tier MC coverage test
  (test_dcdh_bootstrap_cell_period_coverage.py) validating empirical
  95% coverage at both overall and horizon-1 surfaces on a DGP with
  within-group-varying PSU.

REGISTRY.md Survey + bootstrap contract Note rewritten to describe
the cell-level allocator, the dispatcher's bit-identity routing, and
the shared-PSU-weight multi-horizon machinery. Cluster contract Note
updated; no scope limitation remains in the Survey IF expansion Note.

Replicate-weight variance and n_bootstrap > 0 remain mutually
exclusive by SurveyDesign construction — no change there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses local AI review P1: the dispatcher previously silently fell
back to the legacy group-level bootstrap whenever `psu_varies=True`
but a target's per-cell IF tensor was `None`. Under within-group-
varying PSU that would silently under-cluster (collapsing multi-PSU
group contributions to one PSU).

Fix:
- Raise ValueError in every target branch (overall, joiners, leavers,
  multi-horizon, placebo-horizon) when `psu_varies=True` and the
  target's per-cell tensor is missing. Each raise names the target so
  the error localizes the missed wiring.
- Add shape-check in _unroll_target_to_cells: require
  `u_per_period_target.shape == psu_codes_per_cell.shape` before
  flattening, so a silent cell/PSU misalignment fails loudly.
- Add three negative-contract tests in TestBootstrapCellPeriod:
  varying PSU + missing overall tensor, shape mismatch, and varying
  PSU + missing multi-horizon tensor.
- Update stale `_strata_psu_vary_within_group` docstring (the helper
  no longer gates any out-of-scope combo at fit() time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

⛔ Blocker

Executive Summary

  • Affected method: ChaisemartinDHaultfoeuille survey Hall-Mammen wild bootstrap, specifically the new cell-level PSU allocator in _compute_dcdh_bootstrap.
  • The high-level methodology choice is documented, not a paper/R mismatch: the bootstrap itself and the cell-level PSU extension are explicitly treated as library conventions in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L1-L39 and docs/methodology/REGISTRY.md:L652-L653.
  • Two new silent correctness bugs remain in the PSU-map plumbing. Both can change bootstrap SE/CI/p-values with no warning, so they are P0 blockers.
  • The added zero-weight smoke test is too weak: it only checks “finite” output, so it does not catch the new under-clustering fallback.
  • I did not run pytest in this container; this is a static review of the diff and surrounding code.

Methodology

Affected method: dCDH survey multiplier bootstrap. The papers cited in the module docstrings justify the analytical plug-in variance; this bootstrap path is a documented library extension, so I am reviewing it against the REGISTRY/bootstrap contract rather than treating the existence of the extension as a defect.

  • Severity: P0. Impact: In diff_diff/chaisemartin_dhaultfoeuille.py:L2333-L2391, psu_codes_per_cell_bootstrap is only populated inside the valid_map branch. If any variance-eligible group has no positive-weight cells, valid_map flips to False, the entire per-cell tensor is discarded, and _compute_dcdh_bootstrap() falls back to the legacy group-level path because psu_codes_per_cell=None in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L217-L229. That silently under-clusters bootstrap inference for the remaining positive-weight groups instead of merely dropping the all--1 row, which contradicts the documented sentinel-cell contract in docs/methodology/REGISTRY.md:L653. Concrete fix: always pass psu_codes_per_cell_bootstrap once the tensor is built, even if some rows are all -1; only the legacy group-level map needs per-row validity checks. Then add a regression that the existing zero-weight-group fixture matches the same fit with the zero-weight group removed.
  • Severity: P0. Impact: The new dense PSU coding is created from all positive-weight rows, not the post-filter eligible target subset, in diff_diff/chaisemartin_dhaultfoeuille.py:L2317-L2331. Because _eligible_group_ids is a filtered subset after the singleton-baseline exclusion in diff_diff/chaisemartin_dhaultfoeuille.py:L5360-L5467, the resulting group_id_to_psu_code_bootstrap can contain gaps. _generate_psu_or_group_weights() then uses max(code)+1 and takes the identity fast path whenever n_psu >= n_groups_target in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L692-L756. A target map like [1, 1] or [0, 2, 2] will therefore receive independent group draws even though eligible groups still share a PSU. That silently changes bootstrap SEs in the documented “legacy/coarser PSU” regime instead of remaining bit-identical. Concrete fix: densify group_id_to_psu_code_bootstrap on the eligible subset before passing it downstream, or re-factorize group_to_psu_map inside _generate_psu_or_group_weights() before counting/broadcasting.

Code Quality

No separate code-quality findings beyond the two statistical-correctness blockers above.

Performance

No material performance findings. The new Monte Carlo coverage test is marked slow, which is appropriate.

Maintainability

No separate maintainability finding beyond the PSU-code contract bug above; that bug is itself a sign that the dense-code invariant is currently split across too many call sites.

Tech Debt

  • Severity: P3. Impact: The only nearby dCDH survey item in TODO.md:L59-L60 tracks the broader “cell-period allocator is a library convention” derivation question, not these implementation regressions. Under the review rubric, neither P0 issue is mitigated by existing tracking. Concrete fix: fix the two correctness bugs before merge; only any extra non-blocking follow-up test hardening could be deferred into TODO.md.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: The new zero-weight regression in tests/test_survey_dcdh.py:L1969-L2003 only asserts that the bootstrap SE stays finite. It therefore passes even when the code silently drops back to unclustered group-level weights. The PR also does not add a regression for the new gapped-code/coarser-PSU failure mode. Concrete fix: strengthen the zero-weight test to compare against the same fit with the zero-weight group removed, and add a coarser-PSU fixture where an ineligible filtered-out group leaves a gap in the eligible PSU codes while the remaining eligible groups still share a PSU.

Path to Approval

  1. Preserve the cell-level bootstrap tensor even when some eligible groups have no positive-weight cells; only exclude those rows at unroll time, not by disabling the entire PSU-aware path.
  2. Re-densify the eligible-subset PSU map before _generate_psu_or_group_weights() decides whether the identity fast path applies.
  3. Add two targeted regressions: one where an all-zero-weight eligible group is equivalent to dropping that group entirely, and one where filtered-out groups create gapped eligible PSU codes but the remaining eligible groups still share a PSU.

Addresses two P0 correctness regressions in the PR-4 bootstrap PSU-map
plumbing flagged by CI review.

**P0 #1 - valid_map gate discarded the per-cell tensor too eagerly.**
When any variance-eligible group had no positive-weight cells (all-
sentinel row in psu_codes_per_cell), the old code set valid_map=False
and left BOTH group_id_to_psu_code_bootstrap AND
psu_codes_per_cell_bootstrap as None. The bootstrap then silently
dropped to unclustered group-level instead of excluding only that
group's empty row. Fix: always populate psu_codes_per_cell_bootstrap
once the tensor is built; the cell-level path already masks out -1
cells at unroll time. Always populate group_id_to_psu_code_bootstrap
with a per-group code (use placeholder 0 for all-sentinel rows since
those groups have no IF mass and the multiplier they receive is
irrelevant on either the legacy or the cell-level path).

**P0 #2 - dense PSU codes factorized over non-eligible subset.**
`np.unique(obs_psu_codes[pos_mask_boot])` previously included PSU
labels from groups that were filtered out of _eligible_group_ids
(e.g., singleton-baseline-excluded groups). The excluded groups'
PSUs contributed dense codes that formed gaps in the eligible
subset's map. Downstream `_generate_psu_or_group_weights` computes
`n_psu = max(code) + 1` and triggers the identity fast path when
`n_psu >= n_groups_target`. A gapped map like `[1, 1]` or `[0, 2, 2]`
silently activated independent-draws clustering for eligible groups
that should have shared a multiplier. Fix: restrict the np.unique
factorization to the eligible-subset positive-weight obs only
(`elig_obs_mask = pos_mask_boot & (g_idx_arr >= 0) & (t_idx_arr >=
0)`), so the dense code domain exactly matches the PSUs actually
used by variance-eligible groups.

Tests:
- `test_bootstrap_zero_weight_group_equivalent_to_removing_it`:
  fit with vs without an all-zero-weight eligible group must
  produce byte-identical bootstrap SE at the same seed (byte-
  identity would have failed before P0 #1 fix because valid_map
  flipped the PSU-aware path off for the with-zero-group fit).
- `test_bootstrap_dense_codes_under_singleton_baseline_excluded_group`:
  spies on the group_id_to_psu_code dict passed to
  `_compute_dcdh_bootstrap` under a fixture with an always-treated
  singleton-baseline group and strictly-coarser PSU among eligible
  groups. Asserts the dict's values form a contiguous `[0,
  n_unique-1]` range (no gaps from the excluded group's PSU), and
  that eligible groups sharing a PSU label receive the same dense
  code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ee0cc544958eee05769816325c4567d83be3a32a


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: ChaisemartinDHaultfoeuille survey Hall-Mammen wild bootstrap, especially the new cell-level PSU dispatcher in _compute_dcdh_bootstrap.
  • Methodology-wise, the within-group-varying PSU bootstrap remains a documented library extension rather than an undocumented deviation from the dCDH papers or the registry contract.
  • The two prior blocker findings appear resolved: PSU codes are now densified on the eligible subset, and the bootstrap now hard-fails on missing cell tensors instead of silently degrading to the wrong path.
  • I did not find a remaining P0/P1 silent-correctness defect in the changed estimator/inference path.
  • Two non-blocking follow-ups remain: one misleading warning path still assumes one PSU per group, and the docs/tests are not fully updated to the new supported contract.
  • I could not run pytest here because this environment lacks pytest and numpy; this is a static review.

Methodology

Code Quality

  • Severity: P2. Impact: The pre-bootstrap warning prepass still collapses each eligible group to its first positive-weight PSU label in diff_diff/chaisemartin_dhaultfoeuille.py:L2111-L2148. Now that within-group-varying PSU is supported, that can emit a misleading “strictly coarser PSU” warning on supported cell-level designs. Concrete fix: derive the warning from the dense per-cell PSU map or suppress it whenever the design would take the cell-level path.

Performance

  • No findings.

Maintainability

  • No findings beyond the stale warning-path assumption above.

Tech Debt

  • Severity: P3. Impact: The remaining “cell-period allocator is a library convention, not a full observation-level derivation” limitation is already tracked in TODO.md:L60, so it is mitigated under the review rubric and does not block this PR. Concrete fix: None in this PR.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: REGISTRY still says heterogeneity= plus n_bootstrap > 0 with within-group-varying PSU “still raises NotImplementedError” in docs/methodology/REGISTRY.md:L618, but that gate is no longer part of the fit-time contract. Concrete fix: update that note to match the current behavior, or explicitly clarify that heterogeneity inference remains analytical on bootstrapped fits.
  • Severity: P2. Impact: The new tests do not fully lock down the prior zero-weight varying-PSU fallback bug. The varying-PSU zero-weight test only checks finiteness in tests/test_survey_dcdh.py:L1969-L2003, while the equivalence regression that is meant to catch silent fallback uses psu=group in tests/test_survey_dcdh.py:L2005-L2058, where the buggy and intended paths collapse to the same identity-draw structure. Concrete fix: add an equivalence regression on a genuinely within-group-varying PSU fixture, comparing the zero-weight case to the same fit with that contribution removed.
  • Verification note: I could not execute the tests in this environment because pytest and numpy are unavailable.

…rying-PSU equivalence test

Addresses the three P2 findings from the CI re-review (all P0s cleared).

1. **Warning prepass assumed one PSU per group** (`chaisemartin_dhaultfoeuille.py:2111-2148`). The old code collected `labels[0]` per eligible group, so a within-group-varying PSU design was mis-counted as having one PSU per group and emitted a misleading "strictly-coarser PSU" UserWarning. Rewrite counts unique PSU labels across all positive-weight obs of eligible groups (not just the first label per group); under PSU=group unchanged, under varying-PSU no false warning.

2. **REGISTRY heterogeneity Note still claimed NotImplementedError** (`REGISTRY.md:618`, "Combining heterogeneity= with n_bootstrap > 0 and within-group-varying PSU still raises NotImplementedError"). That gate was removed in the current PR. Update to clarify that heterogeneity inference stays analytical when bootstrap runs on the main ATT surfaces — the two inference paths are independent.

3. **Zero-weight-equivalence test used `psu=group`** (`test_bootstrap_zero_weight_group_equivalent_to_removing_it`). Under PSU=group both the buggy and correct code paths collapse to the same identity-draw structure, so the test didn't actually exercise the P0 #1 regression. Switch the fixture to within-group-varying PSU (period parity per group) so the cell-level dispatcher is invoked and the before-fix silent-dropback bug would fail this test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: c930f74b2db1927747cc768674f7b9f73204ffbf


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1. Impact: The new cell-level bootstrap path is incorrect on a supported edge case: terminal missingness. Raw per-period IF contributions are zeroed on missing transitions via present, which is how the estimator documents support for terminally-missing groups. But _cohort_recenter_per_period() then subtracts cohort means column-by-column across the full period grid, so a group with no observed cell at period t can still acquire non-zero centered mass in that missing column. The new bootstrap path builds psu_codes_per_cell with -1 sentinels for missing/zero-weight cells and _unroll_target_to_cells() drops those entries entirely, so that centered mass is silently discarded before the bootstrap draw. The old NotImplementedError gate prevented this regime; the PR newly enables it for within-group-varying PSU, so bootstrap inference can now be wrong on supported panels with early exit/right-censoring. Concrete fix: before advertising full support, either (a) reject the cell-level bootstrap whenever non-zero U_centered_per_period lands on a psu_codes_per_cell == -1 cell, or (b) redesign the recentering/unroll contract so centered mass is defined only on observed cells and add a regression for a within-group-varying-PSU panel with terminal missingness. diff_diff/chaisemartin_dhaultfoeuille.py:L5161-L5205, diff_diff/chaisemartin_dhaultfoeuille.py:L5233-L5268, diff_diff/chaisemartin_dhaultfoeuille.py:L2354-L2369, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L236-L254, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L596-L629, docs/methodology/REGISTRY.md:L610-L653

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings beyond the methodology blocker above.

Tech Debt

  • Severity: P3. Impact: The broader “cell-period allocator is a library convention, not a formal observation-level derivation” limitation remains properly documented and tracked, so I did not count that general issue against this PR. Concrete fix: None required here. TODO.md:L51-L60, docs/methodology/REGISTRY.md:L652-L653

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: The new Monte Carlo test overclaims its coverage. Its docstring says it guards the shared multi-horizon PSU-weight path and the sup-t joint distribution, but the fit uses L_max=1, so there is only one horizon and no cross-horizon sharing to validate. A regression in the newly added shared-weight broadcast logic could still ship undetected. Concrete fix: run the slow MC fixture with L_max >= 2 and assert a genuinely multi-horizon surface, e.g. horizon-2 coverage and/or finite cband_crit_value. tests/test_dcdh_bootstrap_cell_period_coverage.py:L11-L23, tests/test_dcdh_bootstrap_cell_period_coverage.py:L87-L130

Verification note: static review only; pytest and numpy are not installed in this environment.

Path to Approval

  1. Fix the within-group-varying-PSU bootstrap so non-zero recentered IF mass cannot be dropped on sentinel/missing cells, or restore a targeted gate for that case.
  2. Add a regression covering a survey panel with within-group-varying PSU plus terminal missingness (or later zero-weight cells) and assert the corrected behavior or the targeted error.

…entinel cells

**P1 (methodology):** Under terminal missingness,
`_cohort_recenter_per_period` subtracts cohort column means across
the full period grid, so a group with no observation at period t
acquires non-zero centered mass at that cell. The PR-4 cell-level
bootstrap builds `psu_codes_per_cell` with -1 sentinels for such
cells and `_unroll_target_to_cells` drops them — silently losing
that centered mass. Under within-group-varying PSU + terminal
missingness, this would under-cluster the bootstrap SE/CI/p-values.

Conservative guard: `_unroll_target_to_cells` now raises
`ValueError` when any sentinel cell (-1 PSU) carries non-zero
cohort-recentered IF mass (|u| > 1e-12). The error message points
users to `n_bootstrap=0` for analytical TSL on such panels. The
analytical path has the same mass-leakage behavior under this
regime but was shipped in PR #323; documenting the bootstrap-
specific guard here avoids advertising a broken combination.

Regression test: `test_bootstrap_cell_level_raises_on_sentinel_mass_leak`
constructs a per-cell IF tensor with non-zero mass at a -1-PSU cell
and asserts `_compute_dcdh_bootstrap` raises with the documented
error message.

**P2 (tests):** The slow MC bootstrap coverage test previously ran
at `L_max=1`, which collapsed the multi-horizon block to a single
target and never exercised the cross-horizon shared-weight path
described in its own docstring. Bumped to `L_max=2` so the shared
(n_bootstrap, n_psu) PSU-level weight matrix is drawn once and
broadcast across horizons via each horizon's cell-to-PSU map.
Added three assertions:

- Horizon-1 bootstrap CI coverage in [0.925, 0.975].
- Horizon-2 bootstrap CI coverage in [0.910, 0.975]. Tolerance is
  wider than h-1 because finite-sample analytical TSL coverage on
  this DGP is itself ~0.93 at l=2 (measured offline: analytical
  h-1 = 0.94, h-2 = 0.926 at n_groups=40). An observed bootstrap
  coverage within 1pp of the analytical baseline is consistent
  with correct clustering; a drop to ≤ 0.90 would indicate a
  real shared-weight broadcast regression.
- `cband_crit_value` finite in ≥ 90% of reps — validates that the
  shared (n_bootstrap, n_psu) weight matrix produces a coherent
  joint distribution across horizons (required for a valid sup-t
  simultaneous band).

Bumped n_bootstrap to 1000 (from 500) to keep internal bootstrap
MC noise below ~0.3pp per CI endpoint at horizon-2's slightly
wider percentile-CI spread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 9ebb682d61e8a3076b05245827c775458860005f


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1. Impact: The implementation correctly replaces the previous silent-underclustering bug with an explicit failure, but the PR’s documented contract is now inconsistent with the code. The registry still says terminal missingness is retained, while the new bootstrap helper raises on the terminal-missingness/sentinel-mass case; at the same time, the changelog and fit() docs claim within-group-varying PSU bootstrap support is now generally available. That is an undocumented support carve-out in a variance/inference path. Concrete fix: either implement a bootstrap-compatible treatment of recentered mass on missing cells, or document this limitation everywhere the PR currently claims full support and explicitly direct users to n_bootstrap=0 for that regime. docs/methodology/REGISTRY.md:621 docs/methodology/REGISTRY.md:664 CHANGELOG.md:13 diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:609
  • Severity: P3. Impact: The new cell-level wild PSU bootstrap is a documented library extension beyond the dCDH papers’ analytical variance formula, so I did not count the extension itself as a methodology defect. Concrete fix: None. diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:5 docs/methodology/REGISTRY.md:663

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: The broader “survey cell-period allocator is a library convention, not a full observation-level derivation” limitation remains tracked in TODO.md and documented in REGISTRY.md, so it does not count against this PR. Concrete fix: None required in this PR. TODO.md:60 docs/methodology/REGISTRY.md:663

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: The new tests cover the happy-path varying-PSU bootstrap and the helper-level sentinel-mass guard, but there is still no end-to-end fit() regression for within-group-varying PSU plus terminal missingness. The user-facing failure mode is therefore unpinned even though the public contract changed. Concrete fix: add a fit(..., survey_design=sd, n_bootstrap>0) regression on a panel with later-period missing cells and varying PSU, asserting either the targeted ValueError or finite corrected bootstrap output. tests/test_survey_dcdh.py:1176 tests/test_survey_dcdh.py:1938 tests/test_dcdh_bootstrap_cell_period_coverage.py:92

Path to Approval

  1. Make the public contract consistent with the implementation: either support terminal-missing panels on the new cell-level bootstrap path, or document the limitation in REGISTRY.md, CHANGELOG.md, and the fit()/test docstrings that currently claim full support.
  2. Add an end-to-end regression for fit(..., survey_design=sd, n_bootstrap>0) on a within-group-varying-PSU panel with terminal missingness, asserting the chosen contract.

Verification note: static review only; pytest and numpy are unavailable in this environment.

…ession

Addresses the CI round-3 P1 (docs overclaiming support relative to the
newly-added `_unroll_target_to_cells` guard) and P2 (no end-to-end
fit() regression for terminal missingness + varying PSU + bootstrap).

**P1 (docs contract consistency):**

The cell-level bootstrap now hard-fails on the terminal-missingness
mass-leak regime, but the surrounding docs still advertised full
support. Added a "Bootstrap + terminal-missingness scope note" to:

- `REGISTRY.md` "terminal missingness retained" Note — describes
  the cohort-recentering leakage mechanism and directs users to
  `n_bootstrap=0` for affected panels.
- `REGISTRY.md` Survey + bootstrap contract Note — same carve-out,
  also clarifies that PSU-within-group-constant regimes are
  unaffected (dispatcher routes to the legacy path).
- `CHANGELOG.md` (PR-4 entry) — explicit scope note after the
  "closes the last NotImplementedError gate" claim.
- `fit()` docstring `survey_design` paragraph — scope note directs
  users to `n_bootstrap=0` as the documented workaround.

**P2 (end-to-end fit() regression):**

Added `test_bootstrap_fit_raises_on_terminal_missingness_with_varying_psu`
in TestBootstrapCellPeriod. Fixture: 10 groups with joiners cohort
(at period 3), leavers cohort (at period 4), and never-treated
controls; group 2 is terminally missing at periods 4-5. At period 4
the other joiners serve as stable_1 controls for the leavers,
producing non-zero cohort mean in cohort A — `_cohort_recenter_per_period`
leaks `-col_mean` onto group 2's missing cell. Varying PSU
(period parity per group) routes the bootstrap to the cell-level
path. Test asserts:
- `fit(..., n_bootstrap=50)` raises `ValueError` with the
  documented "no positive-weight observations" message.
- `fit(..., n_bootstrap=0)` succeeds on the same panel — analytical
  TSL supports this regime (the contract the scope note preserves).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: def65032e3086d1eaa36e073618ea41b577fbc4d


Overall Assessment

⛔ Blocker

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The existing survey cell-period allocator TODO does not mitigate the P0 above, because silent variance/SE correctness issues are never deferrable. TODO.md:L60-L60

Security

  • No findings.

Documentation/Tests

  • Severity: P2 [Newly identified]. Impact: The new user-facing text and regression now codify the unsupported analytical fallback rather than guarding it. CHANGELOG.md, the fit() docstring, and test_bootstrap_fit_raises_on_terminal_missingness_with_varying_psu all tell users to expect finite analytical output on the same regime where the shared survey IF expansion loses mass. Concrete fix: once the analytical path is either fixed or guarded, update those docs and replace the finite-SE assertion with a regression for the chosen contract (ValueError or all-NaN inference plus warning). CHANGELOG.md:L13-L13 diff_diff/chaisemartin_dhaultfoeuille.py:L669-L675 tests/test_survey_dcdh.py:L2105-L2180

Path to Approval

  1. Add an analytical-path guard or a mathematically valid fix for missing-cell cohort-centered mass under terminal missingness + within-group-varying PSU in _survey_se_from_group_if / _compute_se.
  2. Remove the current n_bootstrap=0 workaround language from the changelog, registry, and fit() docstring unless/until that analytical path is actually valid on this regime.
  3. Replace the new analytical fallback regression with one that asserts the final guarded behavior on the same fixture.

Static review only; numpy/pytest are unavailable in this environment, so I could not execute the new tests.

Addresses the P0 escalation: recommending `n_bootstrap=0` as a
workaround for terminal missingness + within-group-varying PSU was
incorrect — the analytical TSL path uses the SAME cell-period
allocator and has the same silent mass-drop bug.

**Analytical guard parity.** `_survey_se_from_group_if` now computes
`W_cell` (per-(g,t) weight totals) and, before the cell-to-obs
expansion, checks whether any cell has `W_cell == 0` while the
corresponding cohort-recentered IF mass is non-zero. If so, raises
a targeted `ValueError` mirroring the bootstrap-side
`_unroll_target_to_cells` guard. The message text is aligned
across the two paths (same "no positive-weight observations"
phrasing) so the regression test matches both.

**Docs cleanup.** Removed the "use n_bootstrap=0 as workaround"
language from REGISTRY, CHANGELOG, and the fit() docstring. Replaced
with the correct workaround: pre-process the panel (drop late-exit
groups / trim to a balanced sub-panel), or use an explicit
`psu=<group_col>` so the dispatcher routes through the legacy
group-level path (which does not use the cell-period allocator and
is not affected by the mass-leak).

**Regression test update.** The end-to-end fit() regression now
asserts `ValueError` on BOTH `n_bootstrap=0` and `n_bootstrap > 0`
under the terminally-missing + within-group-varying PSU fixture.

This is technically a behavior change for panels previously covered
silently by PR #323's cell-period analytical allocator — those
panels used to produce finite (but silently mass-dropped) SEs and
now raise. The change closes a real silent-correctness bug; the
analytical path never had a principled treatment for the leaked
mass in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 262fc6160bd491c51c24726d03dadc412acae548


Overall Assessment
Needs changes

Executive Summary

Methodology

  • Severity: P1 [Newly identified]. Impact: the affected method is the survey-design variance path for ChaisemartinDHaultfoeuille, specifically the analytical TSL branch that now shares the cell-period allocator contract with the new cell-level PSU bootstrap. The code now documents that only terminal missingness plus within-group-varying PSU is unsupported, and that constant-PSU regimes should remain on the legacy group-level analytical sum. But that dispatcher does not exist on the analytical side: fit() always passes the per-cell tensor into _compute_se(), and _survey_se_from_group_if() always enables the cell allocator when that tensor is present, then raises on any leaked mass. The result is a production error on a documented supported regime, not just on the intended varying-PSU carve-out. Concrete fix: branch the analytical SE path on PSU-within-group constancy and use the legacy group-level allocator whenever PSU is constant within group; keep the new leaked-mass ValueError only on the true within-group-varying-PSU cell allocator path. diff_diff/chaisemartin_dhaultfoeuille.py:L1985-L2056 diff_diff/chaisemartin_dhaultfoeuille.py:L5753-L5923 diff_diff/chaisemartin_dhaultfoeuille.py:L669-L680 docs/methodology/REGISTRY.md:L621-L664
  • Severity: P3. Impact: the cell-level wild PSU bootstrap is explicitly documented as a library extension, not something prescribed by the dCDH papers or by an R reference implementation. Concrete fix: none. diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L19-L39 docs/methodology/REGISTRY.md:L664-L664

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked findings. The remaining “cell-period allocator is a library convention, not a formal observation-level derivation” limitation is already tracked in TODO.md and documented in the Registry, so it is mitigated under this repo’s review rules. TODO.md:L60-L60

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests exercise the intended varying-PSU success path and the varying-PSU terminal-missingness failure path, but they do not add the companion regression for the constant-PSU terminal-missingness contract that the docs now advertise. That is why the over-scoped analytical guard above can slip through. Concrete fix: add a survey regression with terminal missingness plus psu="group" and another with a strictly coarser within-group-constant PSU, and assert the analytical fit succeeds with finite SE or matches the legacy group-level analytical path. tests/test_survey_dcdh.py:L1176-L1208 tests/test_survey_dcdh.py:L2105-L2187 docs/methodology/REGISTRY.md:L621-L664
  • Severity: P3. Impact: _unroll_target_to_cells() still tells users to “use n_bootstrap=0” as the fallback when leaked mass hits a sentinel cell, but this PR’s own analytical guard means that is no longer the right workaround for the varying-PSU carve-out. Concrete fix: update the helper docstring and ValueError text to match the final supported contract. diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L609-L664

Path to Approval

  1. Add an analytical dispatcher/fallback so terminal-missing survey panels with PSU constant within group use the legacy group-level allocator instead of the cell allocator.
  2. Add regression tests for terminal missingness under psu="group" and under a strictly coarser within-group-constant PSU, alongside the existing varying-PSU ValueError regression.
  3. Update _unroll_target_to_cells() runtime/docs text so it no longer recommends n_bootstrap=0 as the workaround on the leaked-mass varying-PSU path.

Static review only. numpy, pandas, and pytest are unavailable in this environment, so I could not execute the added tests.

The round-5 analytical guard in _survey_se_from_group_if was over-
scoped: it fired on any panel with leaked centered mass, including
PSU-within-group-constant regimes (PSU=group auto-inject, strictly-
coarser-PSU-within-group-constant). But the docs said those regimes
are the intended workaround. Panels with terminal missingness and
PSU=group are the documented supported path and must not raise.

Fix: add an analytical dispatcher inside `_survey_se_from_group_if`
mirroring the bootstrap's `_psu_varies_within_group` routing. When
PSU is within-group-constant on the eligible subset, fall back to
the legacy group-level allocator (which uses U_centered[g] directly
via the row-sum identity and does not trigger the sentinel-mass
guard). Only when PSU actually varies within group does the cell
allocator run — and only then can the sentinel-mass guard fire.

Byte-identity: under PSU=group the row-sum identity makes the cell
and group allocators statistically equivalent, but the legacy
branch was the one in use before PR #323 introduced the cell
allocator. Rerouting to it under within-group-constant regimes is
a regression-free fallback (it's what PR #323 aimed to preserve
via byte-identity in the first place).

Test coverage: added `test_fit_succeeds_on_terminal_missingness_with_psu_group`
— same fixture as the varying-PSU failure regression but with
auto-inject `psu=<group>`, asserts both `n_bootstrap=0` and
`n_bootstrap > 0` succeed with finite SE.

Also updated the cell-level bootstrap `ValueError` text (and the
_unroll_target_to_cells docstring) to no longer advertise
`n_bootstrap=0` as a workaround — both paths now fail consistently
on the varying-PSU carve-out, and the correct workaround is
pre-processing or using an explicit `psu=<group_col>`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 91eb2c7733cd29bc8ebe7b88e560f2b2d80ad166


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1. Impact: affected methods are the replicate-weight analytical SE paths for the main ChaisemartinDHaultfoeuille ATT surfaces that flow through _survey_se_from_group_if() (DID_M, DID_+, DID_-, DID_l, and dynamic placebos). The new fallback at _survey_se_from_group_if() now disables the cell allocator before the helper dispatches to compute_replicate_if_variance(). Because replicate designs do not carry PSU labels here, that branch forces all replicate fits onto the legacy group-level allocator. Replicate variance is observation-level and allocator-sensitive when replicate ratios vary within group, so this silently changes SEs/inference on supported designs. That conflicts with the current ATT survey IF contract still documented in the registry/helper text. Concrete fix: make the new constant-PSU fallback TSL-only, e.g. skip it when resolved.uses_replicate_variance is true, or move the replicate-vs-TSL dispatch ahead of the fallback so replicate fits retain their intended allocator. If the intended methodology is actually to move Class A replicate sites to the legacy allocator, that needs an explicit REGISTRY/docstring/CHANGELOG update plus dedicated regression coverage before merge. diff_diff/chaisemartin_dhaultfoeuille.py#L5773 diff_diff/chaisemartin_dhaultfoeuille.py#L5866 diff_diff/chaisemartin_dhaultfoeuille.py#L5988 docs/methodology/REGISTRY.md#L663 tests/test_survey_dcdh.py#L1729
  • Severity: P3. Impact: the new cell-level wild PSU bootstrap is a documented library extension, not a paper-mandated or R-parity behavior, so I am not treating that extension itself as a defect. Concrete fix: none. diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py#L19 docs/methodology/REGISTRY.md#L664

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3. Impact: the broader “cell-period survey IF expansion is a library convention, not a formal observation-level derivation” limitation remains already tracked and documented; this PR does not make that debt worse. Concrete fix: none required in this PR. TODO.md#L60 docs/methodology/REGISTRY.md#L663

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests cover the intended varying-PSU bootstrap path and the repaired constant-PSU terminal-missingness contract, but they still do not lock the Class A replicate allocator choice. Existing replicate tests mostly assert finiteness / df propagation, and the only allocator-sensitivity regression is for heterogeneity, not the main ATT surfaces. That left the P1 above unguarded. Concrete fix: add a Class A replicate regression that uses within-group-varying replicate ratios and asserts the intended allocator contract for at least overall_se and one horizon/placebo surface; the heterogeneity non-invariance fixture is a good template. tests/test_survey_dcdh.py#L1808 tests/test_survey_dcdh.py#L1729 tests/test_survey_dcdh_replicate_psu.py#L147

Path to Approval

  1. Change _survey_se_from_group_if() so the new constant-PSU fallback only applies on Binder/TSL fits, not on replicate-weight fits.
  2. Add a replicate-weight regression for the Class A ATT surfaces using within-group-varying replicate ratios, and assert the intended SE/inference contract explicitly.
  3. If the allocator change for Class A replicate fits is intentional, update the REGISTRY/helper docs to state that methodology change explicitly before merge.

Static review only: numpy, pandas, and pytest are not installed in this environment, so I could not execute the added tests.

Round-6's constant-PSU fallback in `_survey_se_from_group_if`
silently disabled the cell-period allocator for replicate-weight
designs, because replicate designs have `resolved.psu is None` and
the fallback routes `psu_arr is None` to the legacy group-level
path. That's an allocator change on a Class A surface (overall
DID_M, joiners, leavers, dynamic placebos) under per-row-varying
replicate ratios, where cell and legacy allocators produce
materially different Rao-Wu variances (same non-invariance PR #329
established for heterogeneity).

Fix: restrict the round-6 fallback to the TSL branch by gating on
`not resolved.uses_replicate_variance`. Replicate designs retain
the cell-period allocator (the PR #323 Class A contract), and the
sentinel-mass guard still fires on mass leakage when it applies.

Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios`
constructs legacy and cell-level psi_obs on the fitted replicate
design, feeds both through `compute_replicate_if_variance`, and
asserts they produce materially different variances — locking the
allocator contract so a future refactor that switches Class A to
the legacy allocator would be detectable. Mirrors the heterogeneity
non-invariance guard from PR #329's CI review round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 8c642cee13a7df816c0701b90ab32a295891fea8


Overall Assessment
⚠️ Needs changes.

Executive Summary

  • The prior re-review P1 on Class A replicate routing appears fixed in code: _survey_se_from_group_if() now skips the new within-group-constant PSU fallback when uses_replicate_variance is true, so replicate ATT surfaces keep the cell allocator. diff_diff/chaisemartin_dhaultfoeuille.py:L5866-L5912
  • One P1 remains: the new sentinel-mass guard now executes before the replicate/TSL dispatch, so replicate-weight ATT fits inherit the new terminal-missingness ValueError, but the updated public contract still documents that carve-out only for within-group-varying PSU / analytical TSL / bootstrap paths. diff_diff/chaisemartin_dhaultfoeuille.py:L5889-L6004 docs/methodology/REGISTRY.md:L621-L664 CHANGELOG.md:L12-L13
  • The cell-level Hall-Mammen bootstrap extension itself is documented in the Methodology Registry as a library extension, so I am not treating the extension or the group-vs-cell dispatcher choice as a defect. docs/methodology/REGISTRY.md:L663-L664
  • Test coverage for varying-PSU bootstrap improved materially, but the new replicate regression still does not actually pin the fitted ATT allocator contract; it only shows two synthetic allocators differ. tests/test_survey_dcdh_replicate_psu.py:L173-L287
  • Static review only: numpy/pandas are unavailable in this environment, so I could not execute the added tests.

Methodology

  • Severity: P1. Impact: _survey_se_from_group_if() now raises the new sentinel-mass ValueError before it branches to compute_replicate_if_variance(), and the new fallback explicitly preserves the cell allocator on replicate designs. That means terminally-missing panels can now fail under replicate-weight ATT variance too, but the updated REGISTRY, changelog, and fit() docs still scope the new limitation to within-group-varying PSU / analytical TSL / bootstrap. This is an undocumented contract change on a supported variance path. Concrete fix: either document the replicate-weight limitation in the public contract (REGISTRY.md, CHANGELOG.md, and the fit() docstring) and add a matching regression, or move the guard behind the replicate branch if replicate fits are supposed to retain the prior behavior. diff_diff/chaisemartin_dhaultfoeuille.py:L665-L684 diff_diff/chaisemartin_dhaultfoeuille.py:L5889-L6004 docs/methodology/REGISTRY.md:L621-L664 CHANGELOG.md:L12-L13
  • Severity: P3. Impact: The cell-level wild PSU bootstrap is a documented library extension beyond the dCDH papers / R reference, so the extension itself is not a methodology defect. Concrete fix: none. docs/methodology/REGISTRY.md:L663-L664

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: The broader survey cell-period allocator convention remains explicitly tracked in TODO.md; this PR does not make that already-documented debt worse. Concrete fix: none required in this PR. TODO.md:L51-L60

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: test_att_cell_allocator_with_varying_replicate_ratios claims to lock the repaired Class A replicate contract, but it never compares the fitted overall_se**2 to either reconstructed allocator variance; it only proves two synthetic allocator constructions differ. A future refactor could reintroduce the prior replicate-routing bug and this test would still pass. Concrete fix: capture the fitted ATT variance path and assert that the fitted overall_se**2 matches the intended cell allocator and differs from the legacy group allocator on the same fixture. tests/test_survey_dcdh_replicate_psu.py:L173-L287

Path to Approval

  1. Decide the intended terminal-missingness contract for replicate-weight ATT variance, then make code and docs agree. If the new behavior is intentional, update docs/methodology/REGISTRY.md, CHANGELOG.md, and the fit() docstring to say the guard applies to replicate designs too.
  2. Add a replicate-weight regression on a terminally-missing panel that asserts the intended behavior explicitly.

…ss A regression

**P1 (docs scope carve-out missing for replicate ATT):** the
sentinel-mass `ValueError` fires BEFORE the replicate-vs-TSL
dispatch inside `_survey_se_from_group_if`, so replicate-weight ATT
fits (which always use the cell-period allocator per the PR #323
Class A contract) also raise on terminal missingness. The docs
previously scoped the carve-out to "within-group-varying PSU /
analytical TSL / bootstrap" and silently omitted the replicate
path. Rewrite the scope note in REGISTRY.md (both the balanced-
baseline Note and the Survey + bootstrap contract Note), CHANGELOG,
and the fit() docstring to list all three affected paths explicitly:
Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT,
and the cell-level wild PSU bootstrap. Clarify that the
`psu=<group_col>` workaround applies ONLY to Binder TSL — replicate
ATT and the varying-PSU bootstrap have no allocator fallback; the
only workaround for those paths is pre-processing the panel to
remove terminal missingness.

**P2 (synthetic allocator test didn't pin fitted SE):** the
previous `test_att_cell_allocator_with_varying_replicate_ratios`
showed that synthetic cell and legacy allocators differ, but never
tied the difference back to the fit's `overall_se`. A refactor
switching Class A replicate to the legacy allocator could have
slipped through unnoticed. Rewrite using a pytest monkeypatch spy
on `compute_replicate_if_variance` to capture the psi_obs the fit
actually passes, then reconstruct the legacy-equivalent psi_obs via
the row-sum identity on the SAME captured `U_centered` and verify:

- fit's `overall_se**2` equals the recomputed variance on the
  captured cell-allocator psi_obs (sanity / dispatch alignment);
- the legacy-reconstructed psi_obs produces a materially different
  Rao-Wu variance on the same fixture — so a silent Class A
  allocator regression would flip overall_se and fail this guard.

No code behavior changed in round-8 beyond the doc updates — only
the test and documentation now lock the contract explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 785cb53a4f515cdd087492223adb94b4b4699f8d


Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings.

Executive Summary

  • The prior P1 is resolved: the new terminal-missingness contract is now documented consistently in the fit() docstring, the Methodology Registry, and the changelog, and the replicate ATT path still reaches the guard because the new Binder fallback explicitly skips replicate designs. chaisemartin_dhaultfoeuille.py:661 REGISTRY.md:621 CHANGELOG.md:13 chaisemartin_dhaultfoeuille.py:5894
  • The prior replicate-regression concern is materially addressed: the new spy-based test now pins the fitted ATT variance path to the actual psi_obs passed into compute_replicate_if_variance() and checks that the legacy allocator would differ on the same fixture. test_survey_dcdh_replicate_psu.py:173
  • Methodology-wise, the affected code paths are _compute_dcdh_bootstrap() and _survey_se_from_group_if(). I did not find an undocumented estimator or variance mismatch: the cell-level Hall-Mammen PSU bootstrap is explicitly documented as a library extension, and PSU-within-group-constant cases still route through the legacy path for bit-identity. chaisemartin_dhaultfoeuille_bootstrap.py:91 chaisemartin_dhaultfoeuille_bootstrap.py:217 REGISTRY.md:669
  • One remaining P2 test gap: there is still no end-to-end replicate-weight regression for the newly documented terminal-missingness ValueError; current terminal-missingness tests cover analytical TSL and bootstrap only. test_survey_dcdh.py:2105 test_survey_dcdh_replicate_psu.py:173
  • Static review only: this environment does not have numpy, pandas, or pytest, so I could not execute the changed tests.

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity: P3. Impact: _compute_dcdh_bootstrap()’s docstring still describes 3-tuples and a purely per-group weight-matrix flow, but the helper now consumes 4-tuples plus cell-level per-period tensors and can bootstrap over unrolled cell-level PSU maps. That is only an internal-docs mismatch, but it makes the new contract harder to maintain safely. Concrete fix: update the helper docstring to describe the 4th tuple element and the cell-level dispatch path. chaisemartin_dhaultfoeuille_bootstrap.py:123

Tech Debt

  • Severity: P3. Impact: The broader survey cell-period allocator remains an accepted library convention rather than a paper-derived observation-level linearization; that limitation is already tracked in TODO.md and re-documented in the Registry, so it is non-blocking under the project’s review policy. Concrete fix: none required in this PR. TODO.md:60 REGISTRY.md:668

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: The new replicate test locks allocator choice on a regular panel, and the new terminal-missingness tests cover analytical TSL / bootstrap, but there is still no end-to-end regression asserting the documented replicate ATT failure mode on a terminally missing panel. A future refactor could relax or bypass that replicate error path without a direct test catching it. Concrete fix: add a replicate-weight fixture with a terminally missing group in a surviving cohort and assert that fit() raises the targeted ValueError on the replicate ATT path. chaisemartin_dhaultfoeuille.py:5894 REGISTRY.md:621 test_survey_dcdh.py:2105 test_survey_dcdh_replicate_psu.py:173

…string

Round-8 cleared all P0/P1 findings. Addressing the two remaining
P2/P3s with concrete fixes.

**P2 (tests/docs): replicate ATT terminal-missingness failure mode
lacked an end-to-end regression.** The round-8 docs extended the
carve-out to replicate ATT, but no fit()-level test asserted it.
Added `TestReplicateClassA::test_att_raises_on_terminal_missingness_replicate_path`:
fixture has a terminally-missing joiner group in a cohort whose
surviving peers serve as stable_1 controls at the missing period
(leaking non-zero centered IF mass onto the missing cells). With
SDR replicate weights and per-row-varying ratios, the fit must
raise the documented `ValueError` from `_survey_se_from_group_if`'s
sentinel-mass guard.

**P3 (maintainability): `_compute_dcdh_bootstrap` docstring was
stale.** The docstring still described 3-tuple inputs and a purely
per-group weight-matrix flow, even though PR 4 changed every
per-target input to a 4-tuple and added the cell-level dispatch
path. Rewrote the Parameters section to enumerate every kwarg with
correct shapes, and added explicit entries for the PR-4 additions
(multi_horizon_inputs, placebo_horizon_inputs, group_id_to_psu_code,
eligible_group_ids, u_per_period_overall, psu_codes_per_cell)
including the dispatcher's routing rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b3ac313e4b96db9eaa884492761fb2e99030a983


Overall Assessment
✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

Tech Debt

  • Severity: P3. Impact: The broader survey cell-period allocator remains a documented library convention rather than a paper-derived observation-level survey linearization; that limitation is already tracked in TODO.md and re-documented in REGISTRY.md, so it is non-blocking under the project’s review policy. Concrete fix: none in this PR. TODO.md:L59-L60 docs/methodology/REGISTRY.md:L668-L669

Security

  • No findings.

Documentation/Tests

…ch modes

Round-9 cleared every finding except one P3: the bootstrap helper's
opening summary still read as a purely per-group W @ u_centered /
divisor flow, which no longer matches the cell-level dispatch under
within-group-varying PSU.

Rewrote the introductory numbered steps to describe both paths
explicitly — (a) legacy group-level path under PSU-within-group-
constant regimes and (b) cell-level wild PSU path under within-
group-varying PSU — and cross-reference `_unroll_target_to_cells`,
the terminal-missingness guard, and the shared-PSU-weights multi-
horizon machinery. Also re-flagged the row-sum identity that makes
the two paths statistically equivalent under PSU=group.

No code behavior change. Targeted tests (269) all pass unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ae5838576f471a6ae1481f7460a6c65d69c3045c


Overall Assessment

✅ Looks good. No unmitigated P0 or P1 findings.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

Tech Debt

  • Severity: P3-informational. Impact: The broader survey cell-period allocator remains a documented library convention rather than a paper-derived observation-level survey linearization, but this limitation is already tracked in TODO.md and in the Methodology Registry, so it is mitigated under the project policy. Concrete fix: none required in this PR. See TODO.md:L59 and docs/methodology/REGISTRY.md:L668.

Security

  • No findings.

Documentation/Tests

The shared terminal-missingness ValueError in `_survey_se_from_group_if`
said "Analytical survey SE" and attributed the failure to "within-
group-varying PSU" — misleading for replicate ATT users, since
replicate designs carry no PSU structure and still hit the same
guard.

Branch the wording on `resolved.uses_replicate_variance`:
- Replicate ATT: "Rao-Wu replicate-weight ATT variance" + note
  that replicate ATT unconditionally uses the cell-period allocator
  per the Class A contract, PSU is not involved.
- Binder TSL: keeps the original "Analytical Binder TSL survey SE"
  wording + the within-group-varying-PSU trigger explanation.

Shared mechanic and pre-processing workaround kept the same. The
regression tests still match on "no positive-weight observations"
which appears in both branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: a520c5281a4f81971e84aa6108b94e22f3369a4f


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • The affected methods are the dCDH survey/bootstrap inference paths: the bootstrap-input builder in [diff_diff/chaisemartin_dhaultfoeuille.py:L2302-L2432](/home/runner/work/diff-diff/diff-diff/diff_diff/chaisemartin_dhaultfoeuille.py#L2302), _compute_dcdh_bootstrap() / _unroll_target_to_cells() in [diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L91-L744](/home/runner/work/diff-diff/diff-diff/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py#L91), and _survey_se_from_group_if() in [diff_diff/chaisemartin_dhaultfoeuille.py:L5772-L6009](/home/runner/work/diff-diff/diff-diff/diff_diff/chaisemartin_dhaultfoeuille.py#L5772).
  • Methodology-wise, I did not find an undocumented estimator or variance drift. The analytical Section 3.7.3 plug-in variance remains intact, while the cell-level wild PSU bootstrap and terminal-missingness rejection are explicitly documented library extensions in [docs/methodology/REGISTRY.md:L603-L669](/home/runner/work/diff-diff/diff-diff/docs/methodology/REGISTRY.md#L603).
  • The prior re-review P3 about the analytical/replicate terminal-missingness wording is resolved: _survey_se_from_group_if() now distinguishes Binder TSL from replicate-weight ATT in its ValueError text at [diff_diff/chaisemartin_dhaultfoeuille.py:L5961-L5998](/home/runner/work/diff-diff/diff-diff/diff_diff/chaisemartin_dhaultfoeuille.py#L5961).
  • Remaining issue 1 is P3 only: _unroll_target_to_cells() still suggests psu=<group_col> as a bootstrap workaround for terminal missingness, which contradicts the Registry’s “no allocator fallback” contract for the within-group-varying-PSU bootstrap.
  • Remaining issue 2 is P3 only: the new bootstrap warning heuristic is broader than the stated “strictly coarser PSU only” contract and can warn on supported varying-PSU designs with shared PSU labels.
  • Static review only: executable validation was not possible here because pytest, numpy, and pandas are unavailable in this environment.

Methodology

  • No findings. The changed inference surface matches the documented contract in [docs/methodology/REGISTRY.md:L621-L669](/home/runner/work/diff-diff/diff-diff/docs/methodology/REGISTRY.md#L621): analytical dCDH variance still follows the paper-backed plug-in path, while the within-group-varying-PSU bootstrap and the terminal-missingness guard are explicitly documented library extensions/deviations rather than undocumented methodology changes.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3. Impact: the new warning block says the Hall-Mammen warning should fire only when PSU is “strictly coarser than group,” but the actual predicate is n_psu_eff_warn < n_groups_eff_warn, which can also fire for supported within-group-varying designs that reuse PSU labels across groups. That makes the code/comment/Registry contract internally inconsistent, even if the warning itself is harmless. See [diff_diff/chaisemartin_dhaultfoeuille.py:L2128-L2168](/home/runner/work/diff-diff/diff-diff/diff_diff/chaisemartin_dhaultfoeuille.py#L2128) and [docs/methodology/REGISTRY.md:L669](/home/runner/work/diff-diff/diff-diff/docs/methodology/REGISTRY.md#L669). Concrete fix: either suppress this warning on the varying-PSU cell-level path, or update the comment/Registry to describe the broader n_psu < n_groups heuristic instead of “strictly coarser PSU only.”

Tech Debt

  • No new findings. The broader limitation that the survey cell-period allocator is a documented library convention rather than a formal observation-level derivation remains tracked and mitigated in [TODO.md:L51-L60](/home/runner/work/diff-diff/diff-diff/TODO.md#L51) and [docs/methodology/REGISTRY.md:L668-L669](/home/runner/work/diff-diff/diff-diff/docs/methodology/REGISTRY.md#L668).

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: _unroll_target_to_cells() still tells users they can work around bootstrap terminal-missingness by switching to psu=<group_col>, but the Registry now says that fallback is only valid for analytical Binder TSL; for the within-group-varying-PSU bootstrap there is no allocator fallback and the panel must be pre-processed. That is a user-facing contract mismatch, not a numerical bug. See [diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L690-L743](/home/runner/work/diff-diff/diff-diff/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py#L690) and [docs/methodology/REGISTRY.md:L621-L626](/home/runner/work/diff-diff/diff-diff/docs/methodology/REGISTRY.md#L621). Concrete fix: align the _unroll_target_to_cells() docstring and ValueError text with the Registry: keep the psu=<group_col> workaround only for the analytical Binder TSL path, and direct bootstrap users to pre-process the panel instead.

…= workaround text

**P3 #1 (warning predicate inconsistent with "strictly coarser PSU"
contract):** the new bootstrap warning block's comment said the
warning fires only on strictly-coarser PSU designs, but the
predicate `n_psu_eff_warn < n_groups_eff_warn` could also fire on
supported varying-PSU designs whose eligible groups happened to
share PSU labels across groups. Detect within-group-varying PSU
explicitly (`.groupby("g")["p"].nunique().gt(1).any()`) and
suppress the warning in that regime. Under auto-inject PSU=group
and under within-group-varying PSU the warning now stays silent,
matching the stated contract.

**P3 #2 (`_unroll_target_to_cells` suggested `psu=<group_col>` as a
bootstrap workaround):** the Registry / CHANGELOG already clarified
that `psu=<group_col>` is ONLY a Binder TSL workaround; the cell-
level wild PSU bootstrap has no allocator fallback. The helper's
docstring and `ValueError` message still advertised it as a
bootstrap-path workaround. Dropped that suggestion and explicitly
clarified: the varying-PSU bootstrap IS the cell-level path, so
there is no legacy-allocator alternative to fall back to —
pre-processing the panel is the only workaround on the bootstrap
side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b67c873ffa430f2db2c162179ea529f857613c75


Overall assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive summary

Methodology

Code Quality

Performance

  • Severity: None. Impact: I did not identify a performance regression serious enough to warrant changes in this PR. Concrete fix: None.

Maintainability

Tech Debt

  • Severity: P3-informational. Impact: The broader limitation that the survey cell-period allocator is a documented library convention rather than a formal observation-level derivation remains tracked in TODO.md:60 and documented in docs/methodology/REGISTRY.md:668, so it is mitigated and not a blocker for this PR. Concrete fix: None required in this PR.

Security

  • Severity: None. Impact: No security issues, secrets exposure, or unsafe handling of untrusted PR text were introduced by the changed code. Concrete fix: None.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
igerber and others added 2 commits April 19, 2026 14:20
Recaptured the baseline on origin/main@ac181b7f (the stated reference
SHA). Current dispatcher reproduces this value bit-for-bit, confirming
the PSU-within-group-constant legacy-path routing is intact. The
prior pinned value was captured on an intermediate dev checkpoint
rather than the merged pre-PR-4 SHA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust and pure-Python backends take different numeric paths through
`_generate_bootstrap_weights_batch` and `solve_ols`, so the pinned
pre-PR-4 SE differs between them. Both values were captured on
origin/main@ac181b7f under each backend; bit-identity holds within
each backend independently. Select the baseline at test time based
on the resolved backend (mirrors `_PURE_PYTHON_MODE` detection in
conftest.py).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit e834a48 into main Apr 19, 2026
19 checks passed
@igerber igerber deleted the feature/dcdh-bootstrap-cell-period branch April 19, 2026 20:31
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-
  ready output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant